Handling missing probe-sets in a linear gene classifier
Abstract
In multiple myeloma (a type of cancer), linear models (e.g. EMC92) can be used to estimate a patient's prognosis. If the model outcome exceeds a specific value t (i.e. a dichotomizing threshold), the subject is classified as high-risk. These linear models consist of a number of covariates, each contributing to the model outcome. In the case of EMC92 or UAMS70, these covariates are probe sets. This vignette deals with the situation in which not all covariates are available for computing the model outcome in independent data. The method redistributes the weights of the discarded covariates over the remaining covariates, based on the covariance structure in the training data of that model. The result is a reweighted model.
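The abstract describes two computations:

- a weighted sum of probe-set covariates compared against a threshold t;
- a reweighted model in which the weights of unavailable covariates are moved onto the remaining ones using the training covariance.

The Python sketch below illustrates both, under stated assumptions. The abstract does not give the exact redistribution formula. The hypothetical `reweight` function therefore assumes that each missing covariate is replaced by its best linear predictor from the available covariates, given the training means and covariance. Under that assumption, the available weights w_a become w_a + S_aa^-1 S_am w_m, and an intercept absorbs the mean terms. All function names, the toy weights and the toy covariance are illustrative only. None of them comes from the vignette, EMC92 or UAMS70.

```python
"""Sketch: a linear risk score with a dichotomizing threshold, plus a
HYPOTHETICAL covariance-based reweighting for missing probe sets.

Assumption: missing covariates are replaced by their best linear predictor
from the available ones (regression on the training covariance). This is an
illustrative choice, not necessarily the vignette's exact method.
"""
from typing import List, Sequence, Set, Tuple


def risk_score(weights: Sequence[float], x: Sequence[float], intercept: float = 0.0) -> float:
    # Linear model outcome: intercept plus the weighted sum of the covariates.
    return intercept + sum(w * xi for w, xi in zip(weights, x))


def is_high_risk(score: float, t: float) -> bool:
    # Dichotomize: classify as high-risk when the outcome exceeds threshold t.
    return score > t


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Solve a @ x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def reweight(
    weights: Sequence[float],
    cov: Sequence[Sequence[float]],
    means: Sequence[float],
    missing: Set[int],
) -> Tuple[List[int], List[float], float]:
    """Fold the weights of missing covariates into the available ones.

    Assumes x_m ~ mu_m + S_ma S_aa^-1 (x_a - mu_a), so that
    w_m' x_m ~ delta' x_a + (w_m' mu_m - delta' mu_a), with
    delta = S_aa^-1 S_am w_m.
    """
    avail = [i for i in range(len(weights)) if i not in missing]
    s_aa = [[cov[i][j] for j in avail] for i in avail]
    # S_am @ w_m: covariance of each available covariate with the
    # weighted sum of the missing covariates.
    rhs = [sum(cov[i][k] * weights[k] for k in missing) for i in avail]
    delta = solve(s_aa, rhs)
    new_w = [weights[i] + d for i, d in zip(avail, delta)]
    intercept = sum(weights[k] * means[k] for k in missing) - sum(
        d * means[i] for i, d in zip(avail, delta)
    )
    return avail, new_w, intercept


if __name__ == "__main__":
    # Toy model with three covariates; covariate 2 is unavailable.
    w = [1.0, 0.5, 2.0]
    cov = [[1.0, 0.2, 0.8], [0.2, 1.0, 0.1], [0.8, 0.1, 1.0]]
    avail, new_w, b0 = reweight(w, cov, [0.0, 0.0, 0.0], {2})
    print(avail, new_w, b0)  # new weights are approximately [2.625, 0.375]
    x_avail = [1.0, 1.0]
    print(is_high_risk(risk_score(new_w, x_avail, b0), t=3.0))
```

The covariate strongly correlated with the missing one (index 0) absorbs most of its weight. The weakly correlated one (index 1) is slightly reduced. In this sketch the intercept has to be added to the reweighted score before comparing it with t, so that the threshold keeps its original meaning.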
Similar resources
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data, and these internal data sets are usually completed using external credit bureau’s data and finally used for estimating PD in banks. There is also a continuous interest for bank to use rule based classifiers to b...
A study on the use of imputation methods for experimentation with Radial Basis Function Network classifiers handling missing attribute values: The good synergy between RBFNs and EventCovering method
The presence of Missing Values in a data set can affect the performance of a classifier constructed using that data set as a training sample. Several methods have been proposed to treat missing data and the one used more frequently is the imputation of the Missing Values of an instance. In this paper, we analyze the improvement of performance on Radial Basis Function Networks by means of the us...
Comparison of the EM algorithm and common methods for imputing missing data: a study on a self-medication questionnaire for diabetic patients
Background and Objectives: Missing data is a big challenge in the research. According to the type of the study and of the variables, different ways have been proposed to work with these data. This study compared five popular imputation approaches in addressing missing data in the questionnaires. Methods: In this study, 500 questionnaires were used for self-medication in diabetic patients. Mi...
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenome data mining science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. So, it follows that you can remove redundant genes and increase the speed and accuracy of classification. After applying the gene se...
Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method
Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...
Publication date: 2017